20 research outputs found

    Containment of Shape Expression Schemas for RDF

    Get PDF
    We study the problem of containment for shape expression schemas (ShEx) for RDF graphs. We identify a subclass of ShEx that has a natural graphical representation in the form of shape graphs and their semantics is captured with a tractable notion of embedding of an RDF graph in a shape graph. When applied to pairs of shape graphs, an embedding is a sufficient condition for containment, and for a practical subclass of deterministic shape graphs, it is also a necessary one, thus yielding a subclass with tractable containment. While for general shape graphs a minimal counter-example i.e., an instance proving non-containment, might be of exponential size, we show that containment is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx is coNEXP-hard and in coTwoNEXP^NP

    Management of Inconsistencies in Data Integration

    Get PDF
    Data integration aims at providing a unified view over data coming from various sources. One of the most challenging tasks for data integration is handling the inconsistencies that appear in the integrated data in an efficient and effective manner. In this chapter, we provide a survey on techniques introduced for handling inconsistencies in data integration, focusing on two groups. The first group contains techniques for computing consistent query answers, and includes mechanisms for the compact representation of repairs, query rewriting, and logic programs. The second group contains techniques focusing on the resolution of inconsistencies. This includes methodologies for computing similarity between atomic values as well as similarity between groups of data, collective techniques, scaling to large datasets, and dealing with uncertainty that is related to inconsistencies

    RDF Graph Alignment with Bisimulation

    Get PDF
    We investigate the problem of aligning two RDF databases, an essential problem in understanding the evolution of ontologies. Our approaches address three fundamental challenges: 1) the use of "blank" (null) names, 2) ontology changes in which different names are used to identify the same entity, and 3) small changes in the data values as well as small changes in the graph structure of the RDF database. We propose approaches inspired by the classical notion of graph bisimulation and extend them to capture the natural metrics of edit distance on the data values and the graph structure. We evaluate our methods on three evolving curated data sets. Overall, our results show that the proposed methods perform well and are scalable

    Learning Schemas for Unordered XML

    Get PDF
    We consider unordered XML, where the relative order among siblings is ignored, and we investigate the problem of learning schemas from examples given by the user. We focus on the schema formalisms proposed in [10]: disjunctive multiplicity schemas (DMS) and its restriction, disjunction-free multiplicity schemas (MS). A learning algorithm takes as input a set of XML documents which must satisfy the schema (i.e., positive examples) and a set of XML documents which must not satisfy the schema (i.e., negative examples), and returns a schema consistent with the examples. We investigate a learning framework inspired by Gold [18], where a learning algorithm should be sound i.e., always return a schema consistent with the examples given by the user, and complete i.e., able to produce every schema with a sufficiently rich set of examples. Additionally, the algorithm should be efficient i.e., polynomial in the size of the input. We prove that the DMS are learnable from positive examples only, but they are not learnable when we also allow negative examples. Moreover, we show that the MS are learnable in the presence of positive examples only, and also in the presence of both positive and negative examples. Furthermore, for the learnable cases, the proposed learning algorithms return minimal schemas consistent with the examples.Comment: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Ital

    Characterizing XML Twig Queries with Examples

    Get PDF
    International audienceTypically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with.We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that not only are they characterizable but also with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable

    Bounded repairability for regular tree languages

    Get PDF
    International audienceWe consider the problem of repairing unranked trees (e.g., XML documents) satisfying a given restriction specification R (e.g., a DTD) into unranked trees satisfying a given target specification T. Specifically, we focus on the question of whether one can get from any tree in a regular language R to some tree in another regular language T with a finite, uniformly bounded, number of edit operations (i.e., deletions and insertions of nodes). We give effective characterizations of the pairs of specifications R and T for which such a uniform bound exists, and we study the complexity of the problem under different representations of the regular tree languages (e.g., non-deterministic stepwise automata, deterministic stepwise automata, DTDs). Finally, we point out some connections with the analogous problem for regular languages of words

    Interactive Inference of Join Queries

    Get PDF
    We investigate the problem of inferring join queries from user interactions. The user is presented with a set of candidate tuples and is asked to label them as positive or negative depending on whether she would like the tuples as part of the join result or not. The goal is to quickly infer an arbitrary n-ary join predicate across two relations by keeping the number of user interactions as minimal as possible. We assume no prior knowledge of the integrity constraints between the involved relations. This kind of scenario occurs in several application settings, such as data integration, reverse engineering of database queries, and constraint inference. In such a scenario, the user has either a partial knowledge of the database schemas or the database instances are too big to be skimmed. We explore the search space by using a set of strategies that let us prune what we call “uninformative” tuples, and directly present to the user the informative ones i.e., those that lead to quickly find the goal query that the user has in mind. In this paper, we focus on the inference of joins with equality predicates and we show that for such joins deciding whether a tuple is uninformative can be done in polynomial time. Next, we propose several strategies for presenting tuples to the user in a given order that lets minimize the number of interactions. We show the efficiency and scalability of our approach through an experimental study on both benchmark and synthetic datasets. Finally, we prove that adding projection to our queries makes the problem intractable. 1

    Learning Join Queries from User Examples

    Get PDF
    International audienceWe investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples as part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions as minimal as possible. We assume no prior knowledge of the integrity constraints across the involved relations. Inferring the join predicate across multiple relations when the referential constraints are unknown may occur in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, and we directly present to the user the informative ones that is, those that allow the user to quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates and also allow disjunctive join predicates and projection in the queries. We precisely characterize the frontier between tractability and intractability for the following problems of interest in these settings: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for presenting tuples to the user in a given order that allows minimization of the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets

    Simple Schemas for Unordered XML

    Get PDF
    We consider unordered XML, where the relative order among siblings is ignored, and propose two simple yet practical schema formalisms: disjunctive multiplicity schemas (DMS), and its restriction, disjunction-free multiplicity schemas (MS). We investigate their computational properties and characterize the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, twig query satisfiability, implication, and containment in the presence of schema. Our research indicates that the proposed formalisms retain much of the expressiveness of DTDs without an increase in computational complexity. 1

    Interactive Join Query Inference with JIM

    Get PDF
    National audienceSpecifying join predicates may become a cumbersome task in many situations e.g., when the relations to be joined come from disparate data sources, when the values of the attributes carry little or no knowledge of metadata, or simply when the user is unfamiliar with querying formalisms. Such task is recurrent in many traditional data management applications, such as data integration, constraint inference, and database denormalization, but it is also becoming pivotal in novel crowdsourcing applications. We present Jim (Join Inference Machine), a system for interactive join specification tasks, where the user infers an n-ary join predicate by selecting tuples that are part of the join result via Boolean membership queries. The user can label tuples as positive or negative, while the system allows to identify and gray out the uninformative tuples i.e., those that do not add any information to the final learning goal. The tool also guides the user to reach her join inference goal with a minimal number of interactions
    corecore